CD8 TIL projection uncover cells types hidden by transient gene programs
Cell-Cycle, interferon and Tissue-residency gene programs
Background
Cells responding to external cues might overexpress transient gene programs. In scRNA-seq data, clusters will be driven by the strongest variation component, which can be sometimes transient gene programs. These programs such as the cell cycle, might hide the original, potentially more stable cell types.
These are among the main types of programs that classically are being upregulated in scRNA-seq data:
Cell cycle, when cells go through division (eg. MCM5, TOP2A)
IFN response, in response to IFN stimulation (eg. MX1, ISG15)
HSP response, in response to stress (eg. HSPA1, HSPA2)
Activation response, in response to an activation signal, eg TCR engagement (eg. JUN, FOS)
Tissue-residency program, when cells infiltrate and home within tissues (eg. ZNF683, ITGAE)
Note: while some programs are transient (like the cell cycle), some programs might be more permanent / epigenetically fixed. The distinction between transient and permanent might be harder for some programs, like tissue-residency. We argue here that having a well-curated reference, free from many transient gene programs, will help in distinguishing transient from more stable cell states.
Here we will see how we use reference-based projection to uncover the original cell types from Cycling, IFN-activated and Tissue-resident cells using data from Gueguen et al. 2021.
How to identify gene programs?
Coregulated gene sets can also be found using previously defined
literature but also using unsupervised techniques. Gene modules can be
found using PCA, ICA, cNMF, scCoGAPS,
or find_gene_modules from Monocle3,
which runs UMAP on the genes and then groups them into modules using
Louvain clustering. Here, for simplicity, we chose to focus on
literature base markers, which are easily explainable, explicit
programs.
Data setup
Loading the CD8 reference
First, let’s load the CD8 TIL reference by downloading it from Figshare.
# Load the reference
options(timeout = max(900, getOption("timeout")))
#download.file("https://figshare.com/ndownloader/files/38921366", destfile = "CD8T_human_ref_v1.rds")
ref.cd8 <- load.reference.map("CD8T_human_ref_v1.rds")## [1] "Loading Custom Reference Atlas..."
## [1] "Loaded Custom Reference map Human CD8 TILs"
# Setup colors
mycols <- ref.cd8@misc$atlas.palette
# Plotting
DimPlot(ref.cd8, group.by = 'functional.cluster', label = T, repel = T, cols = mycols) + theme(aspect.ratio = 1)We see that the reference is composed only of stable cell types. Indeed, cells expressing highly transient gene programs (Cycling and IFN) were removed when constructing the reference. HSP / activation-high cells were removed by the stringent mitochondrial gene filter applied.
Load Gueguen et al. data
#download.file("https://figshare.com/ndownloader/files/39082049", destfile = "gueguen.cd3.Rds")
gueguen.cd3 <- readRDS("gueguen.cd3.Rds")
gueguen.cd3$seurat_clusters <- Idents(gueguen.cd3)
# DimPlot
DimPlot(gueguen.cd3, order = T, label = T, repel = T) Projection using reference map
The first step is to project using ProjecTILs
ProjecTILs.classifier function. This projection mode will
give us the labels while keeping the original UMAP embeddings.
# Projection
DefaultAssay(gueguen.cd3) <- "RNA"
gueguen.cd3 <- ProjecTILs.classifier(gueguen.cd3, ref = ref.cd8, filter.cells = T, split.by = 'orig.ident', ncores = 6)
table(gueguen.cd3$functional.cluster)##
## CD8.CM CD8.EM CD8.MAIT CD8.NaiveLike CD8.TEMRA
## 2132 3181 147 238 276
## CD8.TEX CD8.TPEX
## 3340 337
# Plotting
DimPlot(gueguen.cd3, group.by = 'functional.cluster', order = T, cols = mycols, label = T, repel = T)Augmenting cell type classifications with gene program scores
To further augment granularity, here we divide the dataset by computing signatures scores using UCell. To this end, we use our collection of useful transcriptional gene signatures: SignatuR, as well as manually curated signatures.
Here, we will use signatures generated for:
G1S cell cycle
G2M cell cycle
IFN response
Tissue residency
library(SignatuR)
signatures <- GetSignature(SignatuR$Hs)
signatures <- signatures[c("cellCycle.G1S","cellCycle.G2M")]
# Using manual signatures for IFN and resident, which work better with UCell than the full signatures from SignatuR package
signatures[["IFN"]] <- c("ISG15","IFI6","IFI44L","MX1")
signatures[["Tissue_Resident"]] <- c("ITGAE")Now let’s compute the scores using UCell.
gueguen.cd3 <- AddModuleScore_UCell(gueguen.cd3, features = signatures, ncores = 8,
assay = "RNA")FeaturePlot(gueguen.cd3, features = paste0(names(signatures), "_UCell"),
order = T, pt.size = 0.3) & scale_color_paletteer_c("pals::coolwarm") ## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.
We see that these signatures correlate with the annotation of the original paper. It’s especially clear for the Cycling and IFN signals, that were largely contained into their own cluster (CD4/8-MCM5, CD4/8-TOP2A clusters for cycling, and CD4/8-ISG15 cluster for IFN).
Signatures scores distributions
We can check signatures scores distribution to decide which threshold to put to distinguish negative/positive cells. Here we will select the value of 0.2.
qplot(gueguen.cd3$cellCycle.G1S_UCell, main = "G1S-Cycling") +
qplot(gueguen.cd3$cellCycle.G2M_UCell, main = "G2M-Cycling") + qplot(gueguen.cd3$IFN_UCell, main = "IFN") + qplot(gueguen.cd3$Tissue_Resident_UCell, main = "Tissue Resident") & theme_bw()Using UCell scores, we can split the data into high and
low for each signature. Let’s do the test here with the G1S-cycling
signature.
# subset resident high cells in the CD8 reference
cycling.G1S.high <- subset(gueguen.cd3, subset = cellCycle.G1S_UCell > 0.2)
cycling.G1S.low <- subset(gueguen.cd3, subset = cellCycle.G1S_UCell <= 0.2)
# Set list
cycling.G1S.list <- list(cycling.G1S.low,cycling.G1S.high)
names(cycling.G1S.list) <- c("cycling.G1S.low","cycling.G1S.high")# Radars to check differences between resident / non resident
plot.states.radar(ref = ref.cd8, query = cycling.G1S.list, labels.col = 'functional.cluster', min.cells = 5, genes4radar = c("SELL","LEF1","TCF7","CCR7","KLF2","GZMK", "GZMA", "FCGR3A","FGFBP2","XCL1","XCL2", "KLRB1",'MCM5',"MCM3","TOP2A")) Cycling-G1S high and Cycling-G1S low cells display the same overall phenotypes, but differ in their expression of key cycling genes, upregulated when cells start to cycle, such as MCM5, MCM3 and TOP2A.
Now we can do this for the 4 signatures that we selected at the beginning.
wrap_plots(pll, guides = "collect")As previously shown, CD8.TEX and CD8.TPEX are the most cycling, resident and IFN-responding subsets. Results will vary depending on the signatures chosen, the UCell threshold chosen.
Upset plot to check program co-occurences
Let’s focus on the CD8.TEX population, and see how the different gene programs relate using a UpSet plot, using UpSetR package.
gueguen.cd3.TEX <- subset(gueguen.cd3, subset = functional.cluster == "CD8.TEX")
listInput <- list(IFN = which(gueguen.cd3.TEX$IFN.class == "Positive"),
Cycling.G2M = which(gueguen.cd3.TEX$cellCycle.G2M.class == "Positive"),
Cycling.G1S = which(gueguen.cd3.TEX$cellCycle.G1S.class == "Positive"),
Tissue_resident = which(gueguen.cd3.TEX$Tissue_Resident.class == "Positive"))
upset(fromList(listInput), order.by = "freq")
grid.text("CD8.TEX gene programs overlap",x = 0.65, y=0.95, gp=gpar(fontsize=12))We can see that in CD8.TEX, broadly half of the IFN high cells are also high for the tissue-residency program.
Comparison with the original annotation
In the original study, the authors use Seurat
TransferData function to transfer labels from the
non-cycling clusters to the cycling clusters (‘CD4/8-TOP2A’,
‘CD4/8-MCM5’). With label transfer, Cycling cells are defined as
CD8.LAYN and CD8.GZMH.
Now let’s compare this result with our projection results. First, let’s select the 2 cycling clusters.
gueguen.cycling <- subset(gueguen.cd3, subset = seurat_clusters %in% c('CD4/8-TOP2A', "CD4/8-MCM5"))
# Dimplot
DimPlot(gueguen.cycling, group.by = 'functional.cluster', label = T, repel = T, cols = mycols)## Warning: ggrepel: 1 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
We can have a look at clusters proportions in the original cycling clusters.
plot.statepred.composition(ref.cd8, gueguen.cycling, metric = "Percent") +
ggtitle("Cycling cells") + ylim(0, 75) + theme_bw() + RotatedAxis()Let’s check radar plots to confirm identities, where we included some markers genes for cell cycle: TOP2A, MCM5 and MCM3.
plot.states.radar(ref = ref.cd8, query = gueguen.cycling, labels.col = 'functional.cluster', min.cells = 5, genes4radar = c("SELL","LEF1","TCF7","CCR7","KLF2","GZMK", "GZMA", "FCGR3A","FGFBP2","XCL1","XCL2", "KLRB1",'MCM5',"MCM3","TOP2A"))The radar plots match well between the reference and the query. It seems that we have indeed less differentiated cells in the cycling compartment (Central Memory and Effector Memory). Projection seems more robust than Seurat label transfer to uncover cell types hidden behind transient gene programs.
Session Info
sessionInfo()## R version 4.2.1 (2022-06-23)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Ventura 13.1
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] grid stats graphics grDevices datasets utils methods
## [8] base
##
## other attached packages:
## [1] SignatuR_0.1.0 UpSetR_1.4.0 cowplot_1.1.1
## [4] plotly_4.10.1 pheatmap_1.0.12 UCell_2.2.0
## [7] paletteer_1.5.0 scales_1.2.1 ggrepel_0.9.2
## [10] gridExtra_2.3 ProjecTILs_3.0.2 patchwork_1.1.2
## [13] GEOquery_2.66.0 Biobase_2.58.0 BiocGenerics_0.44.0
## [16] data.table_1.14.6 STACAS_2.0.0 scGate_1.4.1
## [19] forcats_0.5.2 stringr_1.5.0 dplyr_1.0.10
## [22] purrr_1.0.1 readr_2.1.3 tidyr_1.2.1
## [25] tibble_3.1.8 ggplot2_3.4.0 tidyverse_1.3.2
## [28] SeuratObject_4.1.3 Seurat_4.3.0
##
## loaded via a namespace (and not attached):
## [1] utf8_1.2.2 spatstat.explore_3.0-5
## [3] reticulate_1.27 tidyselect_1.2.0
## [5] htmlwidgets_1.6.1 BiocParallel_1.32.5
## [7] Rtsne_0.16 munsell_0.5.0
## [9] codetools_0.2-18 ica_1.0-3
## [11] umap_0.2.9.0 future_1.30.0
## [13] miniUI_0.1.1.1 withr_2.5.0
## [15] spatstat.random_3.1-2 colorspace_2.1-0
## [17] progressr_0.13.0 highr_0.10
## [19] knitr_1.41 rstudioapi_0.14
## [21] stats4_4.2.1 SingleCellExperiment_1.20.0
## [23] ROCR_1.0-11 tensor_1.5
## [25] listenv_0.9.0 labeling_0.4.2
## [27] MatrixGenerics_1.10.0 GenomeInfoDbData_1.2.9
## [29] polyclip_1.10-4 farver_2.1.1
## [31] parallelly_1.34.0 vctrs_0.5.2
## [33] generics_0.1.3 xfun_0.36
## [35] timechange_0.2.0 R6_2.5.1
## [37] GenomeInfoDb_1.34.7 rmdformats_1.0.4
## [39] pals_1.7 bitops_1.0-7
## [41] spatstat.utils_3.0-1 cachem_1.0.6
## [43] DelayedArray_0.24.0 assertthat_0.2.1
## [45] promises_1.2.0.1 googlesheets4_1.0.1
## [47] gtable_0.3.1 globals_0.16.2
## [49] goftest_1.2-3 rlang_1.0.6
## [51] splines_4.2.1 lazyeval_0.2.2
## [53] gargle_1.2.1 dichromat_2.0-0.1
## [55] prismatic_1.1.1 spatstat.geom_3.0-5
## [57] broom_1.0.2 BiocManager_1.30.19
## [59] yaml_2.3.7 reshape2_1.4.4
## [61] abind_1.4-5 modelr_0.1.10
## [63] backports_1.4.1 httpuv_1.6.8
## [65] tools_4.2.1 bookdown_0.32
## [67] ellipsis_0.3.2 jquerylib_0.1.4
## [69] RColorBrewer_1.1-3 ggridges_0.5.4
## [71] Rcpp_1.0.10 plyr_1.8.8
## [73] zlibbioc_1.44.0 RCurl_1.98-1.9
## [75] openssl_2.0.5 deldir_1.0-6
## [77] pbapply_1.7-0 S4Vectors_0.36.1
## [79] zoo_1.8-11 SummarizedExperiment_1.28.0
## [81] haven_2.5.1 cluster_2.1.4
## [83] fs_1.6.0 magrittr_2.0.3
## [85] RSpectra_0.16-1 scattermore_0.8
## [87] lmtest_0.9-40 reprex_2.0.2
## [89] RANN_2.6.1 googledrive_2.0.0
## [91] fitdistrplus_1.1-8 matrixStats_0.63.0
## [93] hms_1.1.2 mime_0.12
## [95] evaluate_0.20 xtable_1.8-4
## [97] readxl_1.4.1 IRanges_2.32.0
## [99] compiler_4.2.1 maps_3.4.1
## [101] KernSmooth_2.23-20 crayon_1.5.2
## [103] htmltools_0.5.4 later_1.3.0
## [105] tzdb_0.3.0 lubridate_1.9.0
## [107] DBI_1.1.3 dbplyr_2.3.0
## [109] MASS_7.3-58.2 data.tree_1.0.0
## [111] Matrix_1.5-3 cli_3.6.0
## [113] parallel_4.2.1 igraph_1.3.5
## [115] GenomicRanges_1.50.2 pkgconfig_2.0.3
## [117] sp_1.6-0 spatstat.sparse_3.0-0
## [119] xml2_1.3.3 bslib_0.4.2
## [121] XVector_0.38.0 rvest_1.0.3
## [123] digest_0.6.31 pracma_2.4.2
## [125] sctransform_0.3.5 RcppAnnoy_0.0.20
## [127] spatstat.data_3.0-0 rmarkdown_2.20
## [129] cellranger_1.1.0 leiden_0.4.3
## [131] uwot_0.1.14 shiny_1.7.4
## [133] lifecycle_1.0.3 nlme_3.1-161
## [135] jsonlite_1.8.4 BiocNeighbors_1.16.0
## [137] mapproj_1.2.9 askpass_1.1
## [139] viridisLite_0.4.1 limma_3.54.0
## [141] fansi_1.0.4 pillar_1.8.1
## [143] lattice_0.20-45 fastmap_1.1.0
## [145] httr_1.4.4 survival_3.5-0
## [147] glue_1.6.2 png_0.1-8
## [149] stringi_1.7.12 sass_0.4.5
## [151] rematch2_2.1.2 renv_0.15.5
## [153] irlba_2.3.5.1 future.apply_1.10.0